klotz: data engineering


  1. Google has introduced LangExtract, an open-source Python library designed to help developers extract structured information from unstructured text using large language models such as the Gemini models. The library simplifies the process of converting free-form text into structured data, offering features like controlled generation, text chunking, parallel processing, and integration with various LLMs.
  2. An article discussing the role of data orchestrators in managing complex data workflows, their evolution, and various tools available for orchestration.
  3. This article is part 4 of a crash course on the Model Context Protocol (MCP). It focuses on resources and prompts, explaining their mechanics, distinctions, and implementation, and how they differ from tools. It covers resource types, discovery mechanisms, and application-controlled access patterns.
  4. Keboola MCP Server enables AI-powered data pipeline creation and management. It lets users build, ship, and govern data workflows in natural language through AI assistants such as Claude and Cursor. The server itself is free to use; the only costs are those of standard Keboola usage.
  5. Apache Spark 4.0 marks a major milestone, with enhancements to the SQL language, Spark Connect, reliability, Python capabilities, and Structured Streaming. It is designed to be more powerful, ANSI-compliant, and user-friendly while maintaining compatibility.
  6. The article discusses how Visa leverages retrieval-augmented generation (RAG) and deep learning to enhance operations. It describes Visa's 'Secure ChatGPT,' which offers a multi-model interface for secure internal use, and how RAG improves policy-related data retrieval. The article also explores Visa's data infrastructure and AI's role in fraud prevention.
    2025-03-17 by klotz
  7. This article describes a workflow using Large Language Models (LLMs) to automate the process of normalising spreadsheet data, making it tidy and machine-readable for easier analysis and insights.
  8. A deep dive into the structure and performance benefits of Parquet files, including columnar storage, partitioning strategies, and row groups.
    2025-03-14 by klotz
  9. Google has enhanced Google Sheets with an AI-powered upgrade using its Gemini technology. This update allows users to automatically convert spreadsheets into charts, identify trends, and create advanced visualizations like heatmaps. Users can interact with the Gemini feature directly through a chat interface within Sheets.
  10. Amazon S3 Batch Operations allows you to process hundreds, millions, or even billions of S3 objects efficiently. You can perform various actions such as copying objects, setting tags, restoring from Glacier, or invoking AWS Lambda functions on each object without writing custom code.
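
As a rough illustration of what "tidy" means in item 7, the sketch below reshapes a wide spreadsheet-style table (one column per year) into long rows with one observation each. It uses plain Python rather than an LLM and does not reproduce the linked article's workflow; the function name, column names, and sample data are all hypothetical.

```python
def to_tidy_rows(rows, id_column, variable_name, value_name):
    """Reshape wide rows (one column per period) into long, tidy rows.

    Each output row holds exactly one observation: the identifier,
    the name of the original column, and the value found in that cell.
    Empty cells are skipped instead of being emitted as blank observations.
    """
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column or value in ("", None):
                continue
            tidy_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy_rows


# Hypothetical spreadsheet export: one row per region, one column per year.
wide = [
    {"region": "North", "2023": "120", "2024": "135"},
    {"region": "South", "2023": "", "2024": "150"},
]

long_rows = to_tidy_rows(wide, id_column="region", variable_name="year", value_name="sales")
for r in long_rows:
    print(r)
```

In the long form, each row can be filtered, grouped, or joined without knowing which columns held years, which is the property that makes tidy data easier to analyse.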
